Technical Lead
Job ID 2025-9885
Description
About Company:
Headquartered in El Segundo, Calif., Internet Brands® is a fully integrated online media and software services organization focused on four high-value vertical categories: Health, Automotive, Legal, and Home/Travel. The company's award-winning consumer websites lead their categories and serve more than 250 million monthly visitors, while a full range of web presence offerings has established deep, long-term relationships with SMB and enterprise clients. Internet Brands' powerful, proprietary operating platform provides the flexibility and scalability to fuel the company's continued growth. Internet Brands is a portfolio company of KKR and Temasek.
WebMD Health Corp., an Internet Brands Company, is the leading provider of health information services, serving patients, physicians, health care professionals, employers, and health plans through our public and private online portals, mobile platforms, and health-focused publications. The WebMD Health Network includes WebMD Health, Medscape, Jobson Healthcare Information, prIME Oncology, MediQuality, Frontline, QxMD, Vitals Consumer Services, MedicineNet, eMedicineHealth, RxList, OnHealth, Medscape Education, and other owned WebMD sites. WebMD®, Medscape®, CME Circle®, Medpulse®, eMedicine®, MedicineNet®, theheart.org®, and RxList® are among the trademarks of WebMD Health Corp. or its subsidiaries.
About the Role
Internet Brands is seeking a Lead Data Engineer to drive the development and optimization of data pipelines that power our AI-driven Legal content platforms. This role sits within the Content Ingestion function of the AI team, which enables scalable and intelligent ingestion, normalization, and enrichment of legal data across multiple brands and data sources.
As the Lead Data Engineer, you’ll architect, build, and maintain robust data pipelines and systems that support our AI models, fueling Legal content understanding, recommendation systems, and generative AI workflows. You’ll collaborate closely with Data Scientists, AI Engineers, and Product Managers to ensure that high-quality, structured, and compliant data powers every stage of our Legal AI ecosystem.
Key Responsibilities
- Lead the design and development of scalable data ingestion and processing frameworks for structured, semi-structured, and unstructured Legal data (e.g., case law, firm profiles, practice area articles, contracts).
- Oversee the end-to-end content ingestion lifecycle, from source identification and ETL pipeline design to normalization, metadata tagging, and delivery into AI data lakes.
- Collaborate cross-functionally with AI Research, ML Ops, and Legal Product teams to support downstream model training, data labeling, and retrieval-augmented generation (RAG) pipelines.
- Implement data quality and observability frameworks to ensure accuracy, completeness, and freshness of ingested content.
- Evaluate and integrate third-party APIs, crawlers, and ingestion tools for large-scale legal content aggregation and compliance.
- Partner with Legal and Compliance teams to ensure data governance and security standards are met, particularly for sensitive or proprietary data.
- Continuously optimize pipeline performance, cost efficiency, and scalability across cloud environments (e.g., AWS, GCP, or Azure).
Qualifications
Required:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related technical field.
- 5+ years of experience in data engineering.
- Deep expertise with Python, SQL, and modern ETL frameworks (e.g., Airflow, dbt, Dagster, Luigi).
- Strong experience building and maintaining data pipelines.
- Experience with unstructured data ingestion and processing, including text, media, PDFs, and web content.
- Proficiency in data modeling, data quality frameworks, and metadata management.
- Familiarity with ML/AI workflows, such as dataset preparation for LLMs or model fine-tuning.
Preferred:
- Experience in the Legal, Compliance, or Professional Services domain.
- 2+ years in a lead or senior data engineering role.
- Familiarity with vector databases (e.g., Pgvector, Pinecone, Weaviate, FAISS) and LLM data preparation (tokenization, chunking, RAG).
- Knowledge of NLP pipelines (e.g., spaCy, Hugging Face Transformers) or document understanding frameworks.
- Hands-on experience with data cataloging and lineage tools (e.g., DataHub, Amundsen).
- Demonstrated leadership in establishing data engineering best practices, CI/CD for data, and team mentoring.
Why Join Internet Brands
- Be part of an AI-first transformation driving innovation across one of the largest Legal information networks.
- Collaborate with cross-functional AI experts in ML, NLP, and content intelligence.
- Work in a culture that values autonomy, experimentation, and technical excellence.